Goto

Collaborating Authors

 sinkhorn dro


Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization

arXiv.org Machine Learning

Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Sinkhorn discrepancy: an entropy-regularized Wasserstein distance, referred to as Sinkhorn DRO. Existing work primarily addresses Sinkhorn DRO from a dual perspective, leveraging its formulation as a conditional stochastic optimization problem, for which many stochastic gradient methods are applicable. However, the theoretical analyses of such methods often rely on the boundedness of the loss function, and it is indirect to obtain the worst-case distribution associated with Sinkhorn DRO. In contrast, we study Sinkhorn DRO from the primal perspective, by reformulating it as a bilevel program with several infinite-dimensional lower-level subproblems over probability space. This formulation enables us to simultaneously obtain the optimal robust decision and the worst-case distribution, which is valuable in practical settings, such as generating stress-test scenarios or designing robust learning algorithms. We propose both double-loop and single-loop sampling-based algorithms with theoretical guarantees to solve this bilevel program. Finally, we demonstrate the effectiveness of our approach through a numerical study on adversarial classification.


Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization

arXiv.org Machine Learning

Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows to model uncertainty of distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation taking the form of nested stochastic programming, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence to design a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.


A Data-Driven Approach to Robust Hypothesis Testing Using Sinkhorn Uncertainty Sets

arXiv.org Machine Learning

Hypothesis testing for small-sample scenarios is a practically important problem. In this paper, we investigate the robust hypothesis testing problem in a data-driven manner, where we seek the worst-case detector over distributional uncertainty sets centered around the empirical distribution from samples using Sinkhorn distance. Compared with the Wasserstein robust test, the corresponding least favorable distributions are supported beyond the training samples, which provides a more flexible detector. Various numerical experiments are conducted on both synthetic and real datasets to validate the competitive performances of our proposed method. As a fundamental problem in statistics, hypothesis testing plays a key role in general scientific discovery areas such as anomaly detection and model criticism. The goal of hypothesis testing is to determine which one among given hypotheses is true within a certain error probability level.


Sinkhorn Distributionally Robust Optimization

arXiv.org Machine Learning

Decision-making problems under uncertainty have broad applications in operations research, machine learning, engineering, and economics. When the data involves uncertainty due to measurement error, insufficient sample size, contamination, and anomalies, or model misspecification, distributionally robust optimization (DRO) is a promising approach to data-driven optimization, by seeking a minimax robust optimal decision that minimizes the expected loss under the most adverse distribution within a given set of relevant distributions, called ambiguity set. It provides a principled framework to produce a solution with more promising out-of-sample performance than the traditional sample average approximation (SAA) method for stochastic programming [86]. We refer to [81] for a recent survey on DRO. At the core of DRO is the choice of the ambiguity set. Ideally, a good ambiguity set should take account of the properties of practical applications while maintaining the computational tractability of resulted DRO formulation; and it should be rich enough to contain all distributions relevant to the decision-making but, at the same time, should not include unnecessary distributions that lead to overly conservative decisions. Various DRO formulations have been proposed in the literature. Among them, the ambiguity set based on Wasserstein distance has recently received much attention [104, 67, 17, 46]. The Wasserstein distance incorporates the geometry of sample space, and thereby is suitable for comparing distributions with non-overlapping supports and hedging against data perturbations [46].